Confidence measure based language identification

Authors

  • Florian Metze
  • Thomas Kemp
  • Thomas Schaaf
  • Tanja Schultz
  • Hagen Soltau
Abstract

In this paper we present a new application for confidence measures in spoken language processing. In today's computerized dialogue systems, language identification (LID) is typically achieved via dedicated modules. In our approach, LID is integrated into the speech recognizer, thereby profiting from high-level linguistic knowledge at very little extra cost. Our approach is based on a word-lattice-based confidence measure [3], which was originally devised for unsupervised training. We show that the confidence-based language identification algorithm outperforms conventional score-based methods. This method is also less dependent on the acoustic characteristics of the transmission channel than score-based methods are. By introducing additional parameters, unknown languages can be rejected. The proposed method is compared to a score-based approach on the Verbmobil database, a three-language task.
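The abstract does not spell out the decision rule, so the following is only a minimal sketch under stated assumptions. One recognizer per candidate language decodes the utterance, and each hypothesized word carries a confidence value (for example, a lattice posterior). The utterance score is assumed to be the mean word confidence, and the language with the highest score wins. Two hypothetical thresholds, `min_conf` and `min_margin`, stand in for the unspecified "additional parameters" used to reject unknown languages. All names, thresholds, and values are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    """Hypothetical output of one language-specific recognizer for an utterance."""
    words: List[str]
    confidences: List[float]  # assumed per-word confidences (e.g. lattice posteriors) in [0, 1]


def utterance_confidence(hyp: Hypothesis) -> float:
    # Assumption: aggregate word confidences by their arithmetic mean;
    # an empty hypothesis gets the lowest possible score.
    return mean(hyp.confidences) if hyp.confidences else 0.0


def identify_language(
    hyps: Dict[str, Hypothesis],
    min_conf: float = 0.5,    # hypothetical: reject if even the best score is this low
    min_margin: float = 0.1,  # hypothetical: reject if the top two languages are too close
) -> Optional[str]:
    """Return the language whose recognizer is most confident, or None to reject."""
    scores = {lang: utterance_confidence(h) for lang, h in hyps.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_lang, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best < min_conf or best - runner_up < min_margin:
        return None  # treated as an unknown language
    return best_lang


# Illustrative, made-up recognizer outputs for one utterance.
hyps = {
    "German": Hypothesis(["guten", "Tag"], [0.9, 0.8]),
    "English": Hypothesis(["good", "tag"], [0.4, 0.3]),
    "Japanese": Hypothesis(["gu", "ten"], [0.2, 0.1]),
}
print(identify_language(hyps))  # German
```

Averaging word confidences keeps the score roughly independent of utterance length. Requiring a margin over the runner-up is one simple way to reject inputs that no recognizer explains well. The paper's actual aggregation and rejection parameters may differ.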


Similar resources

Language Identification With Confidence Limits

A statistical classification algorithm and its application to language identification from noisy input are described. The main innovation is to compute confidence limits on the classification, so that the algorithm terminates when enough evidence to make a clear decision has been made, and so avoiding problems with categories that have similar characteristics. A second application, to genre ide...

Offline Language-free Writer Identification based on Speeded-up Robust Features

This article proposes offline language-free writer identification based on speeded-up robust features (SURF), goes through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of word region and the corresponding scales and orientations (SOs) are extr...

Integrating Complementary Features with a Confidence Measure for Speaker Identification

This paper investigates the effectiveness of integrating complementary acoustic features for improved speaker identification performance. The complementary contributions of two acoustic features, i.e. the conventional vocal tract related features MFCC and the recently proposed vocal source related features WOCOR, for speaker identification are studied. An integrating system, which performs a sc...

A Comparison of Spectral Methods for Spoken Language Identification

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

New Confidence Measures for Statistical Machine Translation

A confidence measure is able to estimate the reliability of a hypothesis provided by a machine translation system. The problem of confidence measure can be seen as a process of testing: we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original word-level confidence measures for mach...

Publication date: 2000